In this work, a multi-task learning model for age, gender and emotion recognition on edge devices is developed. The model uses MobileNetV2 as its backbone, with the last three layers customized to produce three outputs for age, gender and emotion. It was trained and tested on a dataset that combines the well-known IMDB dataset with our self-collected dataset. The trained model was then optimized and quantized for deployment on the NPU of the Rockchip RK3588 chip on the Orange Pi Plus hardware platform. Experimental evaluation was performed on several test cases. The results show that the multi-task model achieves prediction accuracy as high as that of the single-task models while significantly reducing computational requirements. On the Orange Pi platform, the best results for age, gender and emotion are a mean absolute error (MAE) of 3.485 and accuracies of 98.281% and 93.917%, respectively. Throughput peaks at 285.7 FPS. These results show high potential for many practical applications on edge devices.
Keywords
Age, Gender and Emotion Recognition, Multi-task Learning, NPUs, Edge Processing